We describe a suite of validation metrics that assess the credibility of a given automatic spike sorting algorithm applied to a given electrophysiological recording, when ground-truth is unavailable. By rerunning the spike sorter two or more times, the metrics measure stability under various perturbations consistent with variations in the data itself, making no assumptions about the noise model, nor about the internal workings of the sorting algorithm. Such stability is a prerequisite for reproducibility of results. We illustrate the metrics on standard sorting algorithms for both in vivo and ex vivo recordings. We believe that such metrics could reduce the significant human labor currently spent on validation, and should form an essential part of large-scale automated spike sorting and systematic benchmarking of algorithms.
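The rerun-and-compare idea can be illustrated with a minimal sketch: re-sort perturbed copies of a recording and score each run's agreement with the unperturbed sort. The function names (`agreement`, `stability`), the additive Gaussian perturbation, and the spike-time matching tolerance are illustrative assumptions, not the paper's actual metrics or perturbation schemes.

```python
import numpy as np

def agreement(st_a, st_b, tol=3):
    """Fraction of spike times in st_a matched within tol samples by
    some unused spike in st_b (both arrays assumed sorted).
    Hypothetical matching rule for illustration only."""
    used = np.zeros(len(st_b), dtype=bool)
    matched = 0
    for t in st_a:
        idx = np.searchsorted(st_b, t)
        for j in (idx - 1, idx):  # nearest candidates on either side
            if 0 <= j < len(st_b) and not used[j] and abs(st_b[j] - t) <= tol:
                used[j] = True
                matched += 1
                break
    return matched / max(len(st_a), 1)

def stability(sorter, recording, rng, n_runs=3, noise_scale=0.1):
    """Rerun `sorter` on perturbed copies of `recording` and report
    each run's agreement with the unperturbed sort: a stability score
    in [0, 1] that needs no noise model and treats the sorter as a
    black box. The Gaussian perturbation here is an assumed stand-in
    for data-consistent perturbations."""
    base = np.sort(sorter(recording))
    scores = []
    for _ in range(n_runs):
        perturbed = recording + noise_scale * recording.std() * \
            rng.standard_normal(recording.shape)
        scores.append(agreement(base, np.sort(sorter(perturbed))))
    return scores
```

A score near 1 across reruns indicates the sorter's output is reproducible under the chosen perturbation; scores that vary widely flag units or recordings whose sorting is not credible without further (e.g. manual) validation.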